{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 6.3 Parsing Simple XML Data\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Problem\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You want to extract data from a simple XML document."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Solution\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The xml.etree.ElementTree module can be used to extract data from simple XML documents.\nTo illustrate, suppose you want to parse the RSS feed on Planet Python. Here is code that does it:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from urllib.request import urlopen\nfrom xml.etree.ElementTree import parse\n\n# Download the RSS feed and parse it; the with block closes the connection afterwards\nwith urlopen('http://planet.python.org/rss20.xml') as u:\n    doc = parse(u)\n\n# Extract and output tags of interest\nfor item in doc.iterfind('channel/item'):\n    title = item.findtext('title')\n    date = item.findtext('pubDate')\n    link = item.findtext('link')\n\n    print(title)\n    print(date)\n    print(link)\n    print()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Running the code above produces output similar to this:"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "    Steve Holden: Python for Data Analysis\n    Mon, 19 Nov 2012 02:13:51 +0000\n    http://holdenweb.blogspot.com/2012/11/python-for-data-analysis.html\n\n    Vasudev Ram: The Python Data model (for v2 and v3)\n    Sun, 18 Nov 2012 22:06:47 +0000\n    http://jugad2.blogspot.com/2012/11/the-python-data-model.html\n\n    Python Diary: Been playing around with Object Databases\n    Sun, 18 Nov 2012 20:40:29 +0000\n    http://www.pythondiary.com/blog/Nov.18,2012/been-...-object-databases.html\n\n    Vasudev Ram: Wakari, Scientific Python in the cloud\n    Sun, 18 Nov 2012 20:19:41 +0000\n    http://jugad2.blogspot.com/2012/11/wakari-scientific-python-in-cloud.html\n\n    Jesse Jiryu Davis: Toro: synchronization primitives for Tornado coroutines\n    Sun, 18 Nov 2012 20:17:49 +0000\n    http://feedproxy.google.com/~r/EmptysquarePython/~3/_DOZT2Kd0hQ/"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Obviously, if you want to do further processing, replace the print() calls with something more interesting."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Discussion\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Working with XML-encoded data is common in many applications.\nNot only is XML widely used for data exchange on the Internet,\nit is also a common format for storing application data (for example, word processing documents and music libraries).\nThe discussion that follows assumes the reader is already familiar with XML basics."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In many cases, when XML is used simply to store data, the document structure is compact and straightforward.\nFor example, the RSS feed from the example above looks similar to the following:"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "    <?xml version=\"1.0\"?>\n    <rss version=\"2.0\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n        <channel>\n            <title>Planet Python</title>\n            <link>http://planet.python.org/</link>\n            <language>en</language>\n            <description>Planet Python - http://planet.python.org/</description>\n            <item>\n                <title>Steve Holden: Python for Data Analysis</title>\n                <guid>http://holdenweb.blogspot.com/...-data-analysis.html</guid>\n                <link>http://holdenweb.blogspot.com/...-data-analysis.html</link>\n                <description>...</description>\n                <pubDate>Mon, 19 Nov 2012 02:13:51 +0000</pubDate>\n            </item>\n            <item>\n                <title>Vasudev Ram: The Python Data model (for v2 and v3)</title>\n                <guid>http://jugad2.blogspot.com/...-data-model.html</guid>\n                <link>http://jugad2.blogspot.com/...-data-model.html</link>\n                <description>...</description>\n                <pubDate>Sun, 18 Nov 2012 22:06:47 +0000</pubDate>\n            </item>\n            <item>\n                <title>Python Diary: Been playing around with Object Databases</title>\n                <guid>http://www.pythondiary.com/...-object-databases.html</guid>\n                <link>http://www.pythondiary.com/...-object-databases.html</link>\n                <description>...</description>\n                <pubDate>Sun, 18 Nov 2012 20:40:29 +0000</pubDate>\n            </item>\n            ...\n        </channel>\n    </rss>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The xml.etree.ElementTree.parse() function parses the entire XML document into a document object.\nFrom there, you use methods such as find(), iterfind(), and findtext() to search for specific XML elements.\nThe argument to these methods is a tag name or a path of tag names, such as channel/item or title."
      ]
    },
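    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the difference between these three methods concrete, here is a minimal sketch that runs them against a small inline document rather than the live feed.\nThe SAMPLE string and its contents are made up for illustration; only the element names follow the RSS layout shown above."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import io\n",
        "from xml.etree.ElementTree import parse\n",
        "\n",
        "# Hypothetical miniature feed standing in for the downloaded RSS document\n",
        "SAMPLE = '''<rss version=\"2.0\">\n",
        "    <channel>\n",
        "        <title>Example Feed</title>\n",
        "        <item><title>First post</title></item>\n",
        "        <item><title>Second post</title></item>\n",
        "    </channel>\n",
        "</rss>'''\n",
        "\n",
        "sample_doc = parse(io.StringIO(SAMPLE))\n",
        "\n",
        "# find() returns the first matching element, or None if nothing matches\n",
        "first_item = sample_doc.find('channel/item')\n",
        "missing = sample_doc.find('channel/nosuchtag')\n",
        "\n",
        "# iterfind() yields every matching element in document order\n",
        "titles = [item.findtext('title') for item in sample_doc.iterfind('channel/item')]\n",
        "\n",
        "# findtext() returns the text of the first match, or the default if there is none\n",
        "sample_title = sample_doc.findtext('channel/title')\n",
        "fallback = sample_doc.findtext('channel/nosuchtag', default='(none)')\n",
        "\n",
        "print(first_item.findtext('title'))\n",
        "print(missing)\n",
        "print(titles)\n",
        "print(sample_title, fallback)"
      ]
    },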
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "When specifying tags, you need to take the overall document structure into account. Each search operation takes place relative to a starting element.\nLikewise, the tag name you supply to each operation is a path relative to that starting element.\nFor example, the call doc.iterfind('channel/item') looks for all item elements under a channel element.\ndoc represents the top of the document (the top-level rss element).\nThe later calls to item.findtext() then search relative to each item element that was found."
      ]
    },
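    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The effect of the starting element can be sketched with a tiny hand-written document.\nThe markup and variable names below are hypothetical and chosen only to show that the same tag name resolves differently depending on where the search begins."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from xml.etree.ElementTree import fromstring\n",
        "\n",
        "# Hypothetical document; fromstring() returns the root (rss) element directly\n",
        "root = fromstring(\n",
        "    '<rss><channel>'\n",
        "    '<title>Feed title</title>'\n",
        "    '<item><title>Item title</title></item>'\n",
        "    '</channel></rss>'\n",
        ")\n",
        "\n",
        "# A bare tag name only matches direct children of the starting element,\n",
        "# so item is not found from the root without naming channel in the path\n",
        "direct = root.find('item')\n",
        "via_channel = root.find('channel/item')\n",
        "\n",
        "# The same tag name gives different results from different starting elements\n",
        "channel = root.find('channel')\n",
        "channel_title = channel.findtext('title')\n",
        "item_title = via_channel.findtext('title')\n",
        "\n",
        "print(direct)\n",
        "print(channel_title)\n",
        "print(item_title)"
      ]
    },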
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Each element represented by the ElementTree module has a few important attributes and methods that are useful when parsing.\nThe tag attribute contains the name of the tag, the text attribute contains the enclosed text, and the get() method retrieves the value of an XML attribute, if present. For example:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "doc"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "e = doc.find('channel/title')\ne"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "e.tag"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "e.text"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "e.get('some_attribute')"
      ]
    },
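    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The title element in the feed carries no XML attributes, so the get() call above simply returns None.\nAs a rough illustration of get() on an element that does have attributes, here is a small sketch using a hypothetical link element; the markup is invented for this example and is not part of the Planet Python feed."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from xml.etree.ElementTree import fromstring\n",
        "\n",
        "# Hypothetical element with two attributes and some enclosed text\n",
        "link_elem = fromstring('<link rel=\"alternate\" href=\"http://example.com/\">home</link>')\n",
        "\n",
        "tag_name = link_elem.tag\n",
        "inner_text = link_elem.text\n",
        "\n",
        "# get() returns the attribute value, or None (or a supplied default) if it is absent\n",
        "href = link_elem.get('href')\n",
        "absent = link_elem.get('title')\n",
        "absent_with_default = link_elem.get('title', 'n/a')\n",
        "\n",
        "print(tag_name, inner_text)\n",
        "print(href)\n",
        "print(absent, absent_with_default)"
      ]
    },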
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "It should be noted that xml.etree.ElementTree is not the only option for XML parsing.\nFor more advanced applications, you might consider lxml.\nIt uses the same programming interface as ElementTree, so the examples above work with lxml as well.\nYou only need to change the initial import to from lxml.etree import parse.\nlxml is fully compliant with the XML standards and is also very fast, and it supports features such as validation, XSLT, and XPath."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.1"
    },
    "toc": {
      "base_numbering": 1,
      "nav_menu": {},
      "number_sections": true,
      "sideBar": true,
      "skip_h1_title": true,
      "title_cell": "Table of Contents",
      "title_sidebar": "Contents",
      "toc_cell": false,
      "toc_position": {},
      "toc_section_display": true,
      "toc_window_display": true
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}